A MEASURE-THEORETIC VERSION OF THE DRAGOMIR-JENSEN INEQUALITY



J. M. ALDAZ 

Abstract. We extend Dragomir's refinement of Jensen's inequality from the discrete to the
general case, identifying the equality conditions.



1. Introduction 

Suppose we have a probability space $(\Omega, \mathcal{A}, P)$, an integrable function $X : \Omega \to \mathbb{R}$, and a
real valued strictly convex function $\phi$ defined in some interval containing the range of $X$. By
Jensen's inequality,

$$E\phi(X) - \phi(EX) \ge 0, \tag{1}$$

with equality precisely when $X(\omega) = E(X)$ for $P$-a.e. $\omega$. Thus, the left hand side of (1)
provides a measure of the spread of $X$ around its mean value. Of course, in the important
special case $\phi(t) = t^2$, the left hand side of (1) is just the variance of $X$. It is natural to study
how these generalized variances change when the probability $P$ varies. The case of discrete
probability measures with finite support was considered by S. S. Dragomir in [Dra].

Here we note that Dragomir's clever proof (refining Jensen's inequality by using Jensen's
inequality itself) can be used to extend his result to general probability spaces. Additionally,
we identify the cases of equality when $\phi$ is strictly convex, and present some immediate
consequences.

2. The Dragomir-Jensen Inequality

To motivate the inequality, consider the following simple example: Let $X$ and $Y$ be random
variables on $(\Omega, \mathcal{A}, P)$, with densities $f_X$ and $f_Y$ respectively, such that the quotient $f_Y/f_X$
is well defined almost everywhere, or in other words, such that the distribution of $Y$, defined
by the push-forward (or induced) probability $Y_*P(A) := P(Y \in A)$, is absolutely continuous
with respect to $X_*P(A) := P(X \in A)$. If both $X$ and $Y$ have mean zero, then it is easy to
bound the variance of $Y$ in terms of the variance of $X$: Since

$$\mathrm{Var}(Y) = \int_{\mathbb{R}} y^2 f_Y(y)\,dy = \int_{\mathbb{R}} y^2\, \frac{f_Y(y)}{f_X(y)}\, f_X(y)\,dy, \tag{2}$$

and $\mathrm{Var}(X) = \int_{\mathbb{R}} y^2 f_X(y)\,dy$, replacing $f_Y/f_X$ by its essential supremum and by its
essential infimum, we get

$$\left(\operatorname{ess\,inf} \frac{f_Y}{f_X}\right) \mathrm{Var}(X) \le \mathrm{Var}(Y) \le \left\|\frac{f_Y}{f_X}\right\|_\infty \mathrm{Var}(X). \tag{3}$$



In fact, (2) holds whenever the expected values of $X$ and $Y$ are finite, and not just for
$\phi(t) = t^2$, but for arbitrary convex functions. This is the content of the (two sided) Dragomir-Jensen
inequality (cf. Theorem 2.1 and Corollary 2.9), which generalizes (3) much in the same
way as Jensen's inequality generalizes $\mathrm{Var}(X) \ge 0$.

Next we recall some basic notations and facts. Given an absolutely continuous measure
$P \ll Q$, as usual, $\frac{dP}{dQ}$ denotes the Radon-Nikodym derivative of $P$ with respect to $Q$,
and $\left\|\frac{dP}{dQ}\right\|_\infty$ denotes its essential supremum (which could be infinite). We adopt the standard
measure-theoretic convention $\infty \cdot 0 = 0$ (recall that under any other convention, the monotone
convergence theorem would fail). Of course, integrals are used here in the Lebesgue sense. In
particular, to have $\int X\,dP$ well defined, it is assumed that either $\int X^+\,dP < \infty$ or $\int X^-\,dP < \infty$,
where $X^+$ and $X^-$ respectively denote the positive and negative parts of $X$. Thus, we do
not consider principal values.

Since we will be dealing with positive measures, and in fact, probabilities, by taking any
representative of the Radon-Nikodym derivative and redefining it (if needed) on a set of
measure zero, we may assume that $0 \le \frac{dP}{dQ} < \infty$ (we use the same notation for the Radon-Nikodym
derivative and its representative).

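Theorem 2.1. Let $(\Omega, \mathcal{A})$ be a measurable space, let $P$ and $Q$ be probability measures defined
on $(\Omega, \mathcal{A})$ such that $P \ll Q$, let $X : \Omega \to \mathbb{R}$ be integrable both with respect to $P$ and $Q$, and
let the real valued function $\phi$ be convex in some (not necessarily bounded) interval containing
the range of $X$. Then

$$\int_\Omega \phi(X)\,dP - \phi\left(\int_\Omega X\,dP\right) \le \left\|\frac{dP}{dQ}\right\|_\infty \left(\int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)\right). \tag{4}$$

Regarding the equality conditions, to avoid trivialities we suppose that $P \ne Q$, and then we
distinguish three cases.

1) Both sides of (4) take the value $\infty$ if and only if $\int_\Omega \phi(X)\,dP = \infty$, and then either
$\int_\Omega \phi(X)\,dQ = \infty$ or $\left\|\frac{dP}{dQ}\right\|_\infty = \infty$.

Next, assume that $\phi$ is strictly convex, and let $A := \left\{\omega \in \Omega : \frac{dP}{dQ}(\omega) = \left\|\frac{dP}{dQ}\right\|_\infty\right\}$.

2) Both sides of (4) take the value $0$ if and only if $X$ is $Q$-a.e. constant.

3) Both sides of (4) take the same value $\alpha$, with $0 < \alpha < \infty$, if and only if the following
three conditions hold:

a) $\int_\Omega \phi(X)\,dQ < \infty$ and $\left\|\frac{dP}{dQ}\right\|_\infty < \infty$.

b) There exists a constant $c$ such that $X = c$, $Q$-a.e. on $\Omega \setminus A$, but $X$ is not $Q$-a.e. constant
on $\Omega$.

c) $Q(A) > 0$, $P(A) > 0$, and $c = \int_\Omega X\,dQ = \frac{1}{Q(A)}\int_A X\,dQ = \int_\Omega X\,dP = \frac{1}{P(A)}\int_A X\,dP$.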
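
Inequality (4) can be checked by hand in the finitely supported setting originally treated by Dragomir. The following is a minimal sketch, not part of the original argument: the three-point space, the values of $X$, the weights `p` and `q`, and the helper names `jensen_gap` and `sup_density` are all hypothetical choices made for illustration, with $\phi(t) = t^2$ so that both sides are variances.

```python
from fractions import Fraction as F

def jensen_gap(phi, xs, weights):
    """Return E[phi(X)] - phi(E[X]) for a finitely supported law."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * phi(x) for w, x in zip(weights, xs)) - phi(mean)

def sup_density(p, q):
    """Essential supremum of dP/dQ, assuming every atom has q[i] > 0."""
    return max(pi / qi for pi, qi in zip(p, q))

# Hypothetical example: values of X on three atoms and two laws with P << Q.
xs = [F(0), F(1), F(3)]
p = [F(1, 2), F(1, 4), F(1, 4)]
q = [F(1, 4), F(1, 4), F(1, 2)]
square = lambda t: t * t

lhs = jensen_gap(square, xs, p)                      # left hand side of (4)
rhs = sup_density(p, q) * jensen_gap(square, xs, q)  # right hand side of (4)
print(lhs, rhs, lhs <= rhs)
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.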

Example 2.2. Given $(\Omega, \mathcal{A}, P)$, $A \subset \Omega$ with $P(A) > 0$, and $X : \Omega \to \mathbb{R}$ integrable, it
is intuitively obvious (and not difficult to prove directly) that the variance $\mathrm{Var}_A(X)$ of $X$
restricted to $A$, with respect to the conditional probability $P_A(B) := P(B|A)$, cannot be
much larger than the original variance of $X$ on all $\Omega$, with respect to $P$. This is now a special
case of the preceding theorem: Since $\frac{dP_A}{dP} = \frac{\chi_A}{P(A)}$, setting $\phi(t) = t^2$ in (4) we get

$$\mathrm{Var}_A(X) \le \frac{\mathrm{Var}(X)}{P(A)}.$$
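
As a small numerical illustration of Example 2.2, the sketch below compares a conditional variance with the bound $\mathrm{Var}(X)/P(A)$. The four-point uniform space, the values of $X$, and the event $A$ are hypothetical; they are chosen so that the conditional variance is actually larger than the unconditional one while still respecting the bound.

```python
from fractions import Fraction as F

def variance(xs, weights):
    """Variance of a finitely supported random variable."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, xs))

# Hypothetical data: uniform law on four atoms, A = the last two atoms.
xs = [F(0), F(1), F(2), F(10)]
p = [F(1, 4)] * 4
in_A = [False, False, True, True]

p_A = sum(w for w, keep in zip(p, in_A) if keep)               # P(A)
cond = [w / p_A if keep else F(0) for w, keep in zip(p, in_A)]  # P(. | A)

var_A = variance(xs, cond)
var_all = variance(xs, p)
print(var_A, var_all, var_A <= var_all / p_A)
```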

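Example 2.3. Given $X : \Omega \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$, suppose we know the distribution of $X$,
but there is some uncertainty about the value of one or several parameters, and we want to
estimate how the variance of $g(X)$ is affected by this uncertainty. The preceding theorem can
be used to this end.

Assume, for instance, that $X \sim N(0, \sigma)$, with $0 < a \le \sigma \le b$. The zero mean assumption
is made so the resulting expressions are simple, but the case where there is uncertainty both
about the mean and the variance can be treated in the same way. Call $\mathrm{Var}_1(g(X))$ and
$\mathrm{Var}_2(g(X))$ the variances obtained by supposing that $X$ has standard deviations $\sigma_1$ and $\sigma_2$
respectively, with $a \le \sigma_1 < \sigma_2 \le b$, and let $P$ and $Q$ be the corresponding laws for $X$. Then

$$\frac{dP}{dQ}(x) = \frac{\sigma_2}{\sigma_1}\, e^{-\frac{x^2}{2}\left(\sigma_1^{-2} - \sigma_2^{-2}\right)}, \quad\text{so}\quad \left\|\frac{dP}{dQ}\right\|_\infty = \frac{\sigma_2}{\sigma_1} \le \frac{b}{a},$$

and hence

$$\mathrm{Var}_1(g(X)) \le \frac{b}{a}\, \mathrm{Var}_2(g(X)).$$

If we additionally know that $g$ is compactly supported, we can present a simultaneous lower
bound (in this regard, see also Corollary 2.9 below). Suppose, for simplicity, that the support
of $g$ is contained in $[c, d]$, with $0 < c < d$. Then on $[c, d]$

$$\frac{dQ}{dP}(x) = \frac{\sigma_1}{\sigma_2}\, e^{\frac{x^2}{2}\left(\sigma_1^{-2} - \sigma_2^{-2}\right)} \le \frac{\sigma_1}{\sigma_2}\, e^{\frac{d^2}{2}\left(\sigma_1^{-2} - \sigma_2^{-2}\right)}, \quad\text{so}\quad \mathrm{Var}_2(g(X)) \le \frac{\sigma_1}{\sigma_2}\, e^{\frac{d^2}{2}\left(\sigma_1^{-2} - \sigma_2^{-2}\right)}\, \mathrm{Var}_1(g(X)),$$

and hence

$$\frac{\sigma_2}{\sigma_1}\, e^{-\frac{d^2}{2}\left(\sigma_1^{-2} - \sigma_2^{-2}\right)}\, \mathrm{Var}_2(g(X)) \le \mathrm{Var}_1(g(X)).$$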
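
The value of the essential supremum used in Example 2.3 can be sanity-checked numerically. The sketch below evaluates the ratio of two centered Gaussian densities on a grid; the standard deviations `sigma1 = 1.0` and `sigma2 = 2.0`, the grid, and the function name `density_ratio` are assumptions made only for illustration.

```python
import math

def density_ratio(x, sigma1, sigma2):
    """dP/dQ at x, for P = N(0, sigma1^2) and Q = N(0, sigma2^2)."""
    scale = sigma2 / sigma1
    return scale * math.exp(-0.5 * x * x * (sigma1 ** -2 - sigma2 ** -2))

sigma1, sigma2 = 1.0, 2.0                      # hypothetical, sigma1 < sigma2
grid = [k / 100 for k in range(-500, 501)]     # points in [-5, 5], includes 0
sup_on_grid = max(density_ratio(x, sigma1, sigma2) for x in grid)
print(sup_on_grid, sigma2 / sigma1)
```

Since $\sigma_1 < \sigma_2$ the exponent is nonpositive, so the maximum is attained at $x = 0$, in agreement with $\left\|\frac{dP}{dQ}\right\|_\infty = \sigma_2/\sigma_1$.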



Remark 2.4. It is natural to ask whether the hypothesis $P \ll Q$ in the theorem must be
imposed on all of $\Omega$, or whether it is sufficient to consider just the set $\{X \ne 0\}$. To see that this is
not enough, let $\Omega = [0, 1]$, let $\mathcal{A}$ be the Lebesgue sets, and let $dP = dx$. Take $X = \chi_{[1/2,1]}$,
and $dQ := 2\chi_{[1/2,1]}\,dx$. Then $\frac{dP}{dQ} = \frac{1}{2}$ on $\{X \ne 0\}$, so restricted to $\{X \ne 0\}$, $P \ll Q$ and $\frac{dP}{dQ}$
is bounded. Let $\phi(x) = x^2$. Since $X$ is constant a.e. with respect to $Q$, the right hand side
of (4) is zero, while the left hand side is just $1/2 - 1/4$.

Even if we extend $\frac{dP}{dQ}$ to all $[0, 1]$ by setting $\frac{dP}{dQ} = \infty$ on $(0, 1/2)$, the $\infty \times 0 = 0$ convention
entails that (4) does not extend to pairs $P$, $Q$ when $P$ has a singular part.
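
The arithmetic in Remark 2.4 can be reproduced with a two-atom model, since $X$ is constant on each of $[0,1/2)$ and $[1/2,1]$ and only the law of $X$ matters. This is an illustrative sketch; the reduction to two atoms and the helper name `jensen_gap` are assumptions of the example, not part of the original text.

```python
from fractions import Fraction as F

def jensen_gap(phi, xs, weights):
    """Return E[phi(X)] - phi(E[X]) for a finitely supported law."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * phi(x) for w, x in zip(weights, xs)) - phi(mean)

# Atom 0 stands for [0, 1/2), atom 1 for [1/2, 1]; X is the indicator of atom 1.
xs = [F(0), F(1)]
p = [F(1, 2), F(1, 2)]   # Lebesgue measure of each half
q = [F(0), F(1)]         # dQ = 2 * indicator of [1/2, 1] dx
square = lambda t: t * t

lhs = jensen_gap(square, xs, p)
ratio_on_support = p[1] / q[1]                       # dP/dQ on {X != 0}
rhs = ratio_on_support * jensen_gap(square, xs, q)
print(lhs, rhs)
```

Here $P$ charges an atom that $Q$ does not, so $P \ll Q$ fails and the left hand side exceeds the right hand side.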
Remark 2.5. The equality conditions for Dragomir's original result were established
in [...]. In the more general measure-theoretic setting, dealing only with bounded functions
would be too restrictive, so the possibility that some term equals $\infty$ must be considered.
Related to this issue is the fact that, since we are working on a probability space, the $L^p$
norm of any function is monotone increasing in $p$. It is thus natural to ask whether the
Dragomir-Jensen inequality can be improved by replacing $\left\|\frac{dP}{dQ}\right\|_\infty$ with $C_p \left\|\frac{dP}{dQ}\right\|_p$ for some
$p < \infty$ and some constant $C_p > 0$. We show that $p = \infty$ is optimal. Fix $p < \infty$ with
$p \ge 1$. Let $Q$ be Lebesgue measure on $(0, 1]$ and let $\phi(x) = x^2$. Set $X(t) = t^{-1/2 + 1/(4p)}$
and $dP(t) := C t^{-1/(2p)}\,dt$, where the constant $C$ is chosen so $C t^{-1/(2p)}$ is a density. Then
$X \in L^1(P) \cap L^1(Q)$, $\int \phi(X)\,dQ < \infty$, $\int \phi(X)\,dP = \infty$, and $\left\|\frac{dP}{dQ}\right\|_p < \infty$, so

$$\int_\Omega \phi(X)\,dP - \phi\left(\int_\Omega X\,dP\right) = \infty > C_p \left\|\frac{dP}{dQ}\right\|_p \left(\int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)\right).$$

On the other hand, $\left\|\frac{dP}{dQ}\right\|_\infty = \infty$ and $0 < \int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)$, so this is an instance
where both sides of the Dragomir-Jensen inequality take the value $\infty$.

To make the structure of the proof of the theorem more transparent, we place some
measure-theoretic details into two technical lemmas.

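Lemma 2.6. Let $(\Omega, \mathcal{A})$ be a measurable space, let $P$ and $Q$ be probability measures defined
on $(\Omega, \mathcal{A})$ such that $P \ll Q$, let $\frac{dP}{dQ}$ be essentially bounded, and let

$$A := \left\{\frac{dP}{dQ} = \left\|\frac{dP}{dQ}\right\|_\infty\right\}.$$

Then for every measurable set $B \subset A$, $P(B) = 0$ if and only if $Q(B) = 0$. In particular,
$P(A) = 0$ if and only if $Q(A) = 0$.

Proof. Since both $P$ and $Q$ are probabilities, $\left\|\frac{dP}{dQ}\right\|_\infty \ge 1$. But restricted to $A$, $P$ is just
$\left\|\frac{dP}{dQ}\right\|_\infty Q$, so the result follows. □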



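Lemma 2.7. With the notation and under the assumptions of the preceding lemma, let the
measurable space $(\tilde\Omega, \tilde{\mathcal{A}})$ be the enlargement of $(\Omega, \mathcal{A})$ obtained by adding a new point $y$ to $\Omega$
and declaring it to be measurable. That is, $y \notin \Omega$, $\tilde\Omega := \Omega \cup \{y\}$, and $\tilde{\mathcal{A}} := \mathcal{A} \cup \{B \cup \{y\} :
B \in \mathcal{A}\}$. Denote by $\delta_y$ the Dirac point mass at $y$. Then the set function $\tilde P$, defined on $(\tilde\Omega, \tilde{\mathcal{A}})$
by

$$\tilde P(B) := Q(B \cap \Omega) - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} P(B \cap \Omega) + \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \delta_y(B), \tag{5}$$

is a probability. Furthermore, $\tilde P(A) = 0$, and for every measurable set $B \subset \Omega \setminus A$, $\tilde P(B) = 0$
if and only if $Q(B) = 0$.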
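Proof. Select any $w \in \Omega$, identify $\Omega$ with $\Omega \times \{0\}$, and set $y := w \times \{1\}$ (this ensures that
$y$ is actually a new point). Then let $\tilde\Omega := \Omega \cup \{y\}$, and let $\tilde{\mathcal{A}} := \mathcal{A} \cup \{B \cup \{y\} : B \in \mathcal{A}\}$.
Since $\tilde{\mathcal{A}}$ contains $\emptyset$, is closed under complementation, and also under countable unions, it is
a $\sigma$-algebra. Additionally, $\{y\}$ is measurable, for $\emptyset \in \mathcal{A}$, whence $\{y\} = \emptyset \cup \{y\} \in \tilde{\mathcal{A}}$.

Let us check that $\tilde P$ is a probability. Clearly $\tilde P(\tilde\Omega) = 1$; to see that $\tilde P$ is non-negative,
we use the change of variable formula $\int_\Omega X\,dP = \int_\Omega X \frac{dP}{dQ}\,dQ$, and we conclude that for every
$B \in \tilde{\mathcal{A}}$ we have

$$\tilde P(B) \ge \int_{B \cap \Omega} \left(1 - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \frac{dP}{dQ}\right) dQ \ge 0.$$

Note next that whenever $B \subset \Omega$ is measurable, (5) reduces to

$$\tilde P(B) = Q(B) - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} P(B) \le Q(B). \tag{6}$$

In particular, $A := \left\{w \in \Omega : \frac{dP}{dQ}(w) = \left\|\frac{dP}{dQ}\right\|_\infty\right\}$ has $\tilde P$-measure zero, since by (6)

$$\tilde P(A) = Q(A) - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} P(A) = Q(A) - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_A \frac{dP}{dQ}\,dQ = Q(A) - Q(A) = 0.$$

Finally, to see that for any measurable $B \subset \Omega \setminus A$, $\tilde P(B) = 0$ if and only if $Q(B) = 0$,
observe that if $Q(B) = 0$, by (6) we have $\tilde P(B) \le Q(B) = 0$, while if $Q(B) > 0$, since
$\left\|\frac{dP}{dQ}\right\|_\infty - \frac{dP}{dQ} > 0$ on $\Omega \setminus A$, we have

$$\tilde P(B) = \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_B \left(\left\|\frac{dP}{dQ}\right\|_\infty - \frac{dP}{dQ}\right) dQ > 0.$$

□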
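
The construction of Lemma 2.7 is easy to carry out explicitly on a finite space. The following sketch builds the weights of $\tilde P$ from (5) and checks that the extended variable, which takes the value $\int X\,dP$ at the new point $y$, has the same mean under $\tilde P$ as $X$ has under $Q$. The three-point data and the function name `extended_measure` are hypothetical choices made for this illustration.

```python
from fractions import Fraction as F

def extended_measure(p, q):
    """Weights of P-tilde on the atoms of Omega, followed by the mass at y.

    Assumes finitely many atoms with q[i] > 0, so that dP/dQ = p[i] / q[i].
    """
    norm = max(pi / qi for pi, qi in zip(p, q))        # ||dP/dQ||_inf
    on_omega = [qi - pi / norm for pi, qi in zip(p, q)]
    return on_omega + [1 / norm], norm

# Hypothetical example data.
xs = [F(0), F(1), F(3)]
p = [F(1, 2), F(1, 4), F(1, 4)]
q = [F(1, 4), F(1, 4), F(1, 2)]

p_tilde, norm = extended_measure(p, q)
mean_p = sum(w * x for w, x in zip(p, xs))
x_tilde = xs + [mean_p]                 # X-tilde(y) := integral of X dP
mean_tilde = sum(w * x for w, x in zip(p_tilde, x_tilde))
mean_q = sum(w * x for w, x in zip(q, xs))
print(p_tilde, mean_tilde, mean_q)
```

In this example the set $A$ is the first atom, where $dP/dQ$ attains its maximum, and it receives $\tilde P$-mass zero, as the lemma asserts.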



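Proof of the Theorem. Note that if $\frac{dP}{dQ}$ is unbounded (i.e., if $\left\|\frac{dP}{dQ}\right\|_\infty = \infty$) then the right
hand side of (4) is $\infty$ unless $\int_\Omega \phi(X)\,dQ = \phi\left(\int_\Omega X\,dQ\right)$, in which case its value is zero (by the
standard convention). So for (4) to hold we need to have $\int_\Omega \phi(X)\,dP = \phi\left(\int_\Omega X\,dP\right)$. This
would be trivial if $\phi$ were strictly convex, for then $X$ would be constant $Q$-a.e., and thus
also constant $P$-a.e. since $P \ll Q$. But for $\phi$ not strictly convex, equality may occur for
non-constant functions, and so a different argument is needed. What we do is to assume first
that $\left\|\frac{dP}{dQ}\right\|_\infty < \infty$, and then handle the unbounded case via an approximation argument. Let
us suppose also that both $\int_\Omega \phi(X)\,dQ < \infty$ and $\int_\Omega \phi(X)\,dP < \infty$. Then inequality (4) is
equivalent to

$$\phi\left(\int_\Omega X\,dQ\right) \le \int_\Omega \phi(X)\,dQ - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_\Omega \phi(X)\,dP + \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \phi\left(\int_\Omega X\,dP\right), \tag{7}$$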
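simply by rearranging terms. But (7) immediately follows from the usual Jensen's inequality,
applied to the probability measure space $(\tilde\Omega, \tilde{\mathcal{A}}, \tilde P)$ defined in Lemma 2.7 and to a suitable
extension $\tilde X$ of $X$, given by $\tilde X := X$ on $\Omega$, and $\tilde X(y) := \int_\Omega X\,dP$. The function $\tilde X : \tilde\Omega \to \mathbb{R}$ is
clearly $\tilde{\mathcal{A}}$ measurable. Furthermore,

$$\int_{\tilde\Omega} \tilde X\,d\tilde P = \int_\Omega X\,d\tilde P + \int_{\{y\}} \tilde X\,d\tilde P \tag{8}$$

$$= \int_\Omega X\,dQ - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_\Omega X\,dP + \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_{\{y\}} \left(\int_\Omega X\,dP\right) d\delta_y = \int_\Omega X\,dQ, \tag{9}$$

so by Jensen's inequality on $(\tilde\Omega, \tilde{\mathcal{A}}, \tilde P)$,

$$\phi\left(\int_\Omega X\,dQ\right) = \phi\left(\int_{\tilde\Omega} \tilde X\,d\tilde P\right) \le \int_{\tilde\Omega} \phi(\tilde X)\,d\tilde P \tag{10}$$

$$= \int_\Omega \phi(X)\,dQ - \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_\Omega \phi(X)\,dP + \left\|\frac{dP}{dQ}\right\|_\infty^{-1} \int_{\{y\}} \phi(\tilde X)\,d\delta_y. \tag{11}$$

It follows from $\tilde X(y) = \int_\Omega X\,dP$ that $\int_{\{y\}} \phi(\tilde X)\,d\delta_y = \phi(\tilde X(y)) = \phi\left(\int_\Omega X\,dP\right)$, so (7) holds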
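and hence so does (4) when every term appearing there is finite. Suppose next that
$\left\|\frac{dP}{dQ}\right\|_\infty = \infty$, and that $\int_\Omega \phi(X)\,dQ = \phi\left(\int_\Omega X\,dQ\right)$ (for otherwise (4) is trivial). Let us emphasize that we
do not a priori assume that $\int_\Omega \phi(X)\,dP < \infty$; this will follow once $\int_\Omega \phi(X)\,dP = \phi\left(\int_\Omega X\,dP\right)$ is
proven. Note however that $\int_\Omega (\phi(X))^-\,dP < \infty$, since $\int_\Omega \phi(X)\,dP \ge \phi\left(\int_\Omega X\,dP\right) > -\infty$.
Define $dP_n := \min\left\{\frac{dP}{dQ}, n\right\} dQ$, and observe that by the monotone convergence theorem, applied
to $\min\left\{\frac{dP}{dQ}, n\right\}$, and separately to $X^+ \min\left\{\frac{dP}{dQ}, n\right\}$, $X^- \min\left\{\frac{dP}{dQ}, n\right\}$, $(\phi(X))^+ \min\left\{\frac{dP}{dQ}, n\right\}$,
and $(\phi(X))^- \min\left\{\frac{dP}{dQ}, n\right\}$, we have

$$\lim_n P_n(\Omega) = \lim_n \int_\Omega \min\left\{\frac{dP}{dQ}, n\right\} dQ = \int_\Omega \frac{dP}{dQ}\,dQ = P(\Omega) = 1,$$

$$\lim_n \int_\Omega X\,dP_n = \int_\Omega X \frac{dP}{dQ}\,dQ = \int_\Omega X\,dP,$$

and

$$\lim_n \int_\Omega \phi(X)\,dP_n = \int_\Omega \phi(X)\,dP.$$

Since $\left\|\frac{dP_n}{dQ}\right\|_\infty \le n$, we know from Jensen's inequality and the previous case that

$$0 \le \frac{1}{P_n(\Omega)} \int_\Omega \phi(X)\,dP_n - \phi\left(\frac{1}{P_n(\Omega)} \int_\Omega X\,dP_n\right) \le \frac{n}{P_n(\Omega)} \left(\int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)\right) = 0.$$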
Thus, for every $n$ we have

$$\int_\Omega \phi(X)\,\frac{dP_n}{P_n(\Omega)} = \phi\left(\int_\Omega X\,\frac{dP_n}{P_n(\Omega)}\right). \tag{12}$$

Since all the limits involved exist, letting $n \to \infty$ and using the continuity of $\phi$ in the interior
of its domain $I$, we conclude that

$$\int_\Omega \phi(X)\,dP = \phi\left(\int_\Omega X\,dP\right) \tag{13}$$

(this is stated below as a Corollary, as it seems to be of independent interest). Note that the
interval $I$ might contain one (or both) of its endpoints, say it contains the endpoint $a$, and $\phi$
might be discontinuous there. But if $\int_\Omega X\,dP = a$, since the range of $X$ is contained in $I$, we
have $X = a$ $P$-a.e., and thus (13) also holds in this case.

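So far, we know that if $\int_\Omega \phi(X)\,dP < \infty$ and $\int_\Omega \phi(X)\,dQ < \infty$, then inequality (4) holds,
regardless of whether or not $\frac{dP}{dQ}$ is essentially bounded, and additionally, that if $\int_\Omega \phi(X)\,dQ =
\phi\left(\int_\Omega X\,dQ\right)$, then $\int_\Omega \phi(X)\,dP = \phi\left(\int_\Omega X\,dP\right)$. Furthermore, it is clear that if $\int_\Omega \phi(X)\,dP = \infty$
and $\left\|\frac{dP}{dQ}\right\|_\infty < \infty$, we must have $\int_\Omega \phi(X)\,dQ = \infty$, so all we need to show is that $\left\|\frac{dP}{dQ}\right\|_\infty =
\infty$ whenever $\int_\Omega \phi(X)\,dP = \infty$ and $\int_\Omega \phi(X)\,dQ < \infty$. This latter inequality, together with
$\int_\Omega \phi(X)\,dQ \ge \phi\left(\int_\Omega X\,dQ\right) > -\infty$, entails that both the positive and negative parts of $\phi(X)$
are $Q$-integrable. Suppose, towards a contradiction, that $\left\|\frac{dP}{dQ}\right\|_\infty < \infty$. Then

$$\int_\Omega \phi(X)\,dP \le \int_\Omega |\phi(X)|\,\frac{dP}{dQ}\,dQ \le \left\|\frac{dP}{dQ}\right\|_\infty \int_\Omega |\phi(X)|\,dQ < \infty.$$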



In view of the fact that inequality (4) holds under no restrictions, the first equality case,
where both sides take the value $\infty$, follows immediately.

Next, suppose that $\phi$ is strictly convex. By the equality case of Jensen's inequality the
right hand side of (4) is zero (hence, so is the left hand side) if and only if $X$ is constant
$Q$-a.e.; thus, statement 2) of the theorem is true.

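Finally, suppose there exists a constant $\alpha \in (0, \infty)$ such that

$$\int_\Omega \phi(X)\,dP - \phi\left(\int_\Omega X\,dP\right) = \alpha = \left\|\frac{dP}{dQ}\right\|_\infty \left(\int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)\right). \tag{14}$$

Then a) is immediate; using (4) we conclude that $\int_\Omega \phi(X)\,dP$ is also finite, so equality on
(4) is equivalent to having equality on (7), which in turn is equivalent to $\phi\left(\int_{\tilde\Omega} \tilde X\,d\tilde P\right) =
\int_{\tilde\Omega} \phi(\tilde X)\,d\tilde P$. Now $X$ is not constant $Q$-a.e. on $\Omega$ (by the equality case in Jensen's inequality,
otherwise $\alpha$ would be zero) but $\tilde X$ is constant $\tilde P$-a.e. by the equality case in Jensen's inequality;
call this constant $c$. Since by Lemma 2.7 $\tilde P$ and $Q$ are mutually absolutely continuous over
the set $\Omega \setminus A$, statement b) follows.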
To see why c) holds, note first that if $Q(A) = 0$, then by b) $X = c$ $Q$-a.e. and we are in
the case $\alpha = 0$. But $\alpha > 0$, so $Q(A) > 0$, and hence $P(A) > 0$ by Lemma 2.6. Since $P \ne Q$,
we also have $\tilde P(\Omega \setminus A) > 0$. Using $\tilde X = c$ $\tilde P$-a.e. together with (8)-(9), we see that for $\tilde P$-a.e.
(or $Q$-a.e.) $w \in \Omega \setminus A$,

$$\int_\Omega X\,dQ = \int_{\tilde\Omega} \tilde X\,d\tilde P = c = X(w) = \tilde X(y) = \int_\Omega X\,dP. \tag{15}$$

Since $X = c$ a.e. (with respect to all measures under consideration) on $\Omega \setminus A$,

$$\int_{\Omega \setminus A} X\,dQ + \int_A X\,dQ = c(1 - Q(A)) + \int_A X\,dQ = \int_\Omega X\,dQ = c,$$

so

$$\frac{1}{Q(A)} \int_A X\,dQ = c. \tag{16}$$

The fact that

$$\frac{1}{P(A)} \int_A X\,dP = c \tag{17}$$

is obtained in the same way. Alternatively, since $\int_\Omega X\,dQ = \int_\Omega X\,dP$ and $X = c$ a.e. on $\Omega \setminus A$,
we see that (16) holds if and only if (17) does.

Suppose now that a), b) and c) hold. Using a) and b) we conclude that the right hand
side of (4) is neither zero nor infinity. Since all terms involved are finite, equality on (4) is
equivalent to $\phi\left(\int_{\tilde\Omega} \tilde X\,d\tilde P\right) = \int_{\tilde\Omega} \phi(\tilde X)\,d\tilde P$, which holds if $\tilde X$ is $\tilde P$-a.e. constant on $\tilde\Omega$. We
prove this next. First, $\tilde X = c$ $\tilde P$-a.e. on $\Omega \setminus A$ by b) and Lemma 2.7, while by definition and
by c), $\tilde X(y) = \int_\Omega X\,dP = c$. Since $\tilde P(A) = 0$ by Lemma 2.7, the result follows. □

The next corollary was obtained as a step in the preceding proof, and of course, it also
follows directly from the statement of Theorem 2.1.

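Corollary 2.8. Let $(\Omega, \mathcal{A}, Q)$ be a probability space, let $X : \Omega \to \mathbb{R}$ be integrable, and let
$\phi$ be real-valued and convex in some interval containing the range of $X$. If $\phi\left(\int_\Omega X\,dQ\right) =
\int_\Omega \phi(X)\,dQ$, then for every probability $P \ll Q$ such that $X \in L^1(P)$, we have $\phi\left(\int_\Omega X\,dP\right) =
\int_\Omega \phi(X)\,dP$.

Proof. By (4), since $\int_\Omega \phi(X)\,dP - \phi\left(\int_\Omega X\,dP\right) \ge 0$. □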
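
Corollary 2.8 is of interest precisely when $\phi$ is not strictly convex. As a hedged illustration, take $\phi(t) = |t|$ and a nonnegative $X$, so that $\phi$ is affine on the range of $X$ and Jensen's inequality is an equality under $Q$; the corollary then predicts equality under every $P \ll Q$. The data below and the helper name `jensen_gap` are hypothetical.

```python
from fractions import Fraction as F

def jensen_gap(phi, xs, weights):
    """Return E[phi(X)] - phi(E[X]) for a finitely supported law."""
    mean = sum(w * x for w, x in zip(weights, xs))
    return sum(w * phi(x) for w, x in zip(weights, xs)) - phi(mean)

xs = [F(1), F(2), F(5)]               # X >= 0, so abs is affine on its range
q = [F(1, 3), F(1, 3), F(1, 3)]
p = [F(3, 5), F(3, 10), F(1, 10)]     # a hypothetical P << Q

gap_q = jensen_gap(abs, xs, q)
gap_p = jensen_gap(abs, xs, p)
print(gap_q, gap_p)
```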

We state next the version of Theorem 2.1 (where for simplicity we omit the analogous
equality conditions) that provides a lower bound (instead of an upper bound) using the
essential infimum of the Radon-Nikodym derivative (instead of the essential supremum).
Recall that if $P$ and $Q$ are mutually absolutely continuous and $h$ is any representative of
$\frac{dP}{dQ}$, then $1/h$ is a representative of $\frac{dQ}{dP}$.
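Corollary 2.9. Let $(\Omega, \mathcal{A})$ be a measurable space, let $P$ and $Q$ be probability measures defined
on $(\Omega, \mathcal{A})$ such that $P \ll Q$, let $X : \Omega \to \mathbb{R}$ be integrable both with respect to $P$ and $Q$, and
let the real valued function $\phi$ be convex in some interval containing the range of $X$. Then

$$\left(\operatorname{ess\,inf} \frac{dP}{dQ}\right) \left(\int_\Omega \phi(X)\,dQ - \phi\left(\int_\Omega X\,dQ\right)\right) \le \int_\Omega \phi(X)\,dP - \phi\left(\int_\Omega X\,dP\right). \tag{18}$$

Proof. If $\operatorname{ess\,inf} \frac{dP}{dQ} = 0$, then inequality (18) reduces to the usual Jensen inequality, so only
when $\operatorname{ess\,inf} \frac{dP}{dQ} > 0$ does (18) say something new. But in this case we have $Q \ll P$, with
$\frac{dQ}{dP} = \left(\frac{dP}{dQ}\right)^{-1}$ a bounded function, since $\left\|\frac{dQ}{dP}\right\|_\infty = \left(\operatorname{ess\,inf} \frac{dP}{dQ}\right)^{-1}$. Inequality (18) follows
from (4): Multiply both sides of (18) by $\left\|\frac{dQ}{dP}\right\|_\infty$, and note that this is just (4) with the roles
of $P$ and $Q$ interchanged. □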

Since in the nontrivial case $\operatorname{ess\,inf} \frac{dP}{dQ} > 0$ the preceding corollary reduces to Theorem 2.1,
the corresponding equality conditions follow automatically.

If $\phi$ is concave, then applying Theorem 2.1 and Corollary 2.9 to $-\phi$ we obtain the following

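Corollary 2.10. Let $(\Omega, \mathcal{A})$ be a measurable space, let $P$ and $Q$ be probability measures
defined on $(\Omega, \mathcal{A})$ such that $P \ll Q$, let $X : \Omega \to \mathbb{R}$ be integrable both with respect to $P$ and
$Q$, and let the real valued function $\phi$ be concave in some interval containing the range of $X$.
Then

$$\left(\operatorname{ess\,inf} \frac{dP}{dQ}\right) \left(\phi\left(\int_\Omega X\,dQ\right) - \int_\Omega \phi(X)\,dQ\right) \le \tag{19}$$

$$\phi\left(\int_\Omega X\,dP\right) - \int_\Omega \phi(X)\,dP \le \left\|\frac{dP}{dQ}\right\|_\infty \left(\phi\left(\int_\Omega X\,dQ\right) - \int_\Omega \phi(X)\,dQ\right). \tag{20}$$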



Next we state the corresponding refinement of the measure-theoretic version of the
arithmetic-geometric mean inequality $\exp \int_\Omega \log(X)\,dP \le \int_\Omega X\,dP$, thereby generalizing [Xlt Theorem
2.1].

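Corollary 2.11. Let $(\Omega, \mathcal{A})$ be a measurable space, let $P \ll Q$ be probability measures
defined on $(\Omega, \mathcal{A})$, and let $X : \Omega \to [0, \infty)$ be such that $\log X$ is integrable both with respect to
$P$ and $Q$. Then

$$\left(\operatorname{ess\,inf} \frac{dP}{dQ}\right) \left(\int_\Omega X\,dQ - \exp \int_\Omega \log(X)\,dQ\right) \le \tag{21}$$

$$\int_\Omega X\,dP - \exp \int_\Omega \log(X)\,dP \le \left\|\frac{dP}{dQ}\right\|_\infty \left(\int_\Omega X\,dQ - \exp \int_\Omega \log(X)\,dQ\right). \tag{22}$$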
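
The two-sided bound (21)-(22) can be illustrated on a finite space, where the integrals become weighted arithmetic and geometric means. This is only a numerical sketch in floating point; the values of $X$, the weights, and the helper name `am_gm_gap` are assumptions of the example.

```python
import math

def am_gm_gap(xs, weights):
    """Weighted arithmetic mean minus weighted geometric mean (all xs > 0)."""
    arithmetic = sum(w * x for w, x in zip(weights, xs))
    geometric = math.exp(sum(w * math.log(x) for w, x in zip(weights, xs)))
    return arithmetic - geometric

# Hypothetical data with P << Q on three atoms.
xs = [1.0, 4.0, 16.0]
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
ratios = [pi / qi for pi, qi in zip(p, q)]   # dP/dQ on each atom

gap_p, gap_q = am_gm_gap(xs, p), am_gm_gap(xs, q)
lower = min(ratios) * gap_q                  # left side of (21)
upper = max(ratios) * gap_q                  # right side of (22)
print(lower, gap_p, upper)
```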



Let us finish by saying that the reader interested in applications of the Dragomir-Jensen
inequality to information inequalities can find some such applications (for the discrete case)
in Dragomir's original paper [Dra].